102 research outputs found
A Batch Rival Penalized Expectation-Maximization Algorithm for Gaussian Mixture Clustering with Automatic Model Selection
Within the maximum weighted likelihood (MWL) learning framework proposed by Cheung (2004, 2005), this paper develops a batch Rival Penalized Expectation-Maximization (RPEM) algorithm for density mixture clustering, applicable when all observations are available before the learning process begins. Compared to the adaptive RPEM algorithm of Cheung (2004, 2005), the batch RPEM needs no learning rate, analogous to the Expectation-Maximization (EM) algorithm (Dempster et al., 1977), yet still preserves the capability of automatic model selection. Further, the batch RPEM generally converges faster than both the EM and the adaptive RPEM. Experiments show the superior performance of the proposed algorithm on synthetic data and in color image segmentation.
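To make the weighted-EM idea concrete, below is a minimal NumPy sketch of a batch EM loop with rival penalization for a Gaussian mixture. The weight design (boost the per-sample winning component, penalize the rivals by a small factor, and prune components whose mixing weights collapse) is a simplified illustration of the MWL principle, not Cheung's exact designated-weight formula; all names and constants here are ours.

```python
# Minimal sketch of batch EM with rival penalization for a Gaussian mixture.
# The rival-penalizing weights and the pruning rule are simplified
# illustrations of the MWL idea, not the paper's exact formulas.
import numpy as np
from scipy.stats import multivariate_normal

def batch_rpem(X, k_init=10, eps=0.05, prune_tol=1e-3, n_iter=100):
    n, d = X.shape
    rng = np.random.default_rng(0)
    mu = X[rng.choice(n, k_init, replace=False)]          # component means
    cov = np.array([np.cov(X.T) for _ in range(k_init)])  # covariances
    alpha = np.full(k_init, 1.0 / k_init)                 # mixing weights

    for _ in range(n_iter):
        k = len(alpha)
        # E-step: posterior h[t, j] of component j for sample t
        pdf = np.column_stack([
            alpha[j] * multivariate_normal.pdf(X, mu[j], cov[j], allow_singular=True)
            for j in range(k)])
        h = pdf / (pdf.sum(axis=1, keepdims=True) + 1e-300)
        # Rival penalization: the per-sample winner is rewarded,
        # all rival components receive a penalized (negative) weight.
        g = -eps * h
        winners = h.argmax(axis=1)
        g[np.arange(n), winners] = (1 + eps) * h[np.arange(n), winners]
        # M-step on the penalized weights (clipped for numerical safety)
        w = np.clip(g, 0, None)
        s = w.sum(axis=0)
        alpha = s / s.sum()
        for j in range(k):
            if s[j] > 0:
                mu[j] = (w[:, j:j + 1] * X).sum(axis=0) / s[j]
                diff = X - mu[j]
                cov[j] = (w[:, j] * diff.T) @ diff / s[j] + 1e-6 * np.eye(d)
        # Automatic model selection: drop components driven to zero weight
        keep = alpha > prune_tol
        alpha, mu, cov = alpha[keep] / alpha[keep].sum(), mu[keep], cov[keep]
    return alpha, mu, cov
```

Starting with a deliberately large `k_init` and letting the penalization drive redundant components' mixing weights to zero is what yields the automatic model selection described in the abstract.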
Towards Label-free Scene Understanding by Vision Foundation Models
Vision foundation models such as Contrastive Vision-Language Pre-training
(CLIP) and Segment Anything (SAM) have demonstrated impressive zero-shot
performance on image classification and segmentation tasks. However, the
incorporation of CLIP and SAM for label-free scene understanding has yet to be
explored. In this paper, we investigate the potential of vision foundation
models in enabling networks to comprehend 2D and 3D worlds without labelled
data. The primary challenge lies in effectively supervising networks under
extremely noisy pseudo labels, which are generated by CLIP and further
exacerbated during the propagation from the 2D to the 3D domain. To tackle
these challenges, we propose a novel Cross-modality Noisy Supervision (CNS)
method that leverages the strengths of CLIP and SAM to supervise 2D and 3D
networks simultaneously. In particular, we introduce a prediction consistency
regularization to co-train 2D and 3D networks, then further impose the
networks' latent space consistency using SAM's robust feature
representation. Experiments conducted on diverse indoor and outdoor datasets
demonstrate the superior performance of our method in understanding 2D and 3D
open environments. Our 2D and 3D networks achieve label-free semantic
segmentation with 28.4% and 33.5% mIoU on ScanNet, improvements of 4.7% and
7.9%, respectively. On the nuScenes dataset, our method reaches 26.8% mIoU,
an improvement of 6%. Code will be released
(https://github.com/runnanchen/Label-Free-Scene-Understanding)
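As a rough illustration of the prediction-consistency idea described above, the following PyTorch sketch penalizes the symmetric KL divergence between the 2D and 3D networks' class predictions at corresponding pixel-point pairs. This is a generic co-training loss under our own naming, not the paper's exact CNS objective, and the pixel-to-point correspondence pipeline is omitted.

```python
# Generic prediction-consistency loss for co-training a 2D and a 3D network:
# each network is pulled toward the other's prediction at corresponding
# pixel-point pairs. A simplified stand-in, not the paper's exact CNS loss.
import torch
import torch.nn.functional as F

def prediction_consistency_loss(logits_2d: torch.Tensor,
                                logits_3d: torch.Tensor) -> torch.Tensor:
    """logits_2d, logits_3d: (N, C) logits at N corresponding pixel-point pairs."""
    log_p2d = F.log_softmax(logits_2d, dim=1)
    log_p3d = F.log_softmax(logits_3d, dim=1)
    # Symmetric KL so neither modality dominates the other.
    kl_2d_to_3d = F.kl_div(log_p3d, log_p2d.exp(), reduction="batchmean")
    kl_3d_to_2d = F.kl_div(log_p2d, log_p3d.exp(), reduction="batchmean")
    return 0.5 * (kl_2d_to_3d + kl_3d_to_2d)
```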
Variation of peripheral pulse transit time with internal vascular pressure changes induced by arm movement
Pulse transit time (PTT) and blood pressure (BP) are widely used to quantify arterial characteristics, and arm position influences both arterial BP and peripheral PTT. This study aims to quantify the relationship between PTT changes and the internal vascular pressure variations induced by arm movement. With the left arm held horizontal as a reference and the right arm moved from 90° to 45°, 0°, −45°, and −90° in turn, the PTT difference was calculated as the difference in pulse-foot arrival time between the right and left arms within the same heartbeat. The corresponding change in BP was calculated from the gravitational (hydrostatic) effect using the measured arm length. Our results showed that the change in PTT with arm elevation is more pronounced than with arm lowering, indicating that PTT responds differently to internal BP changes in the two directions. This can help in understanding the underlying physiological and pathological mechanisms of the cardiovascular system.
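The gravitational effect mentioned above is the standard hydrostatic relation ΔP = ρ·g·L·sin θ. A small Python helper using a typical blood-density value and the usual pascal-to-mmHg conversion (the study's exact protocol constants are not reproduced here):

```python
# Hydrostatic estimate of the internal BP change at the distal measurement
# site when the arm is tilted by angle theta from horizontal:
# delta_P = rho * g * L * sin(theta). Blood density and the mmHg conversion
# are standard textbook values, not constants taken from the study.
import math

RHO_BLOOD = 1060.0    # blood density, kg/m^3 (typical literature value)
G = 9.81              # gravitational acceleration, m/s^2
PA_PER_MMHG = 133.322

def hydrostatic_bp_change_mmhg(arm_length_m: float, angle_deg: float) -> float:
    """Pressure change (mmHg) at the hand relative to the horizontal position.

    Raising the arm (positive angle) lowers local pressure, hence the minus sign.
    """
    delta_pa = RHO_BLOOD * G * arm_length_m * math.sin(math.radians(angle_deg))
    return -delta_pa / PA_PER_MMHG

# Example: a 0.6 m arm raised to +45 degrees lowers distal pressure by ~33 mmHg.
print(round(hydrostatic_bp_change_mmhg(0.6, 45), 1))
```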
Rethinking Range View Representation for LiDAR Segmentation
LiDAR segmentation is crucial for autonomous driving perception. Recent
trends favor point- or voxel-based methods as they often yield better
performance than the traditional range view representation. In this work, we
unveil several key factors in building powerful range view models. We observe
that the "many-to-one" mapping, semantic incoherence, and shape deformation are
possible impediments against effective learning from range view projections. We
present RangeFormer -- a full-cycle framework comprising novel designs across
network architecture, data augmentation, and post-processing -- that better
handles the learning and processing of LiDAR point clouds from the range view.
We further introduce a Scalable Training from Range view (STR) strategy that
trains on arbitrary low-resolution 2D range images, while still maintaining
satisfactory 3D segmentation accuracy. We show that, for the first time, a
range view method is able to surpass the point, voxel, and multi-view fusion
counterparts in the competing LiDAR semantic and panoptic segmentation
benchmarks, i.e., SemanticKITTI, nuScenes, and ScribbleKITTI.
Comment: ICCV 2023; 24 pages, 10 figures, 14 tables; webpage at https://ldkong.com/RangeFormer
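For context, the range view discussed above is typically built by the standard spherical projection sketched below; several 3D points can land on the same pixel, which is exactly the "many-to-one" mapping the paper identifies as an impediment. The field-of-view values are typical of a 64-beam sensor and are our assumptions, not RangeFormer's configuration.

```python
# Standard spherical (range view) projection of a LiDAR point cloud.
# Multiple 3D points can map to one pixel ("many-to-one"); keeping the
# nearest range per pixel is the usual convention.
import numpy as np

def range_projection(points: np.ndarray, H: int = 64, W: int = 2048,
                     fov_up_deg: float = 3.0, fov_down_deg: float = -25.0):
    """points: (N, 3) xyz. Returns an (H, W) range image and per-point (v, u)."""
    fov_up = np.radians(fov_up_deg)
    fov_down = np.radians(fov_down_deg)
    fov = fov_up - fov_down

    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    depth = np.linalg.norm(points, axis=1) + 1e-8
    yaw = np.arctan2(y, x)           # azimuth in [-pi, pi]
    pitch = np.arcsin(z / depth)     # elevation

    u = (0.5 * (1.0 - yaw / np.pi) * W).astype(np.int32)          # column
    v = ((1.0 - (pitch - fov_down) / fov) * H).astype(np.int32)   # row
    u = np.clip(u, 0, W - 1)
    v = np.clip(v, 0, H - 1)

    # Writing nearer points last keeps the closest range per pixel.
    order = np.argsort(depth)[::-1]
    image = np.full((H, W), -1.0, dtype=np.float32)
    image[v[order], u[order]] = depth[order]
    return image, np.stack([v, u], axis=1)
```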
CLIP2Scene: Towards Label-efficient 3D Scene Understanding by CLIP
Contrastive Language-Image Pre-training (CLIP) achieves promising results in
2D zero-shot and few-shot learning. Despite the impressive performance in 2D,
applying CLIP to help the learning in 3D scene understanding has yet to be
explored. In this paper, we make the first attempt to investigate how CLIP
knowledge benefits 3D scene understanding. We propose CLIP2Scene, a simple yet
effective framework that transfers CLIP knowledge from 2D image-text
pre-trained models to a 3D point cloud network. We show that the pre-trained 3D
network yields impressive performance on various downstream tasks, i.e.,
annotation-free semantic segmentation and fine-tuning with labelled data.
Specifically, built upon CLIP, we design a Semantic-driven Cross-modal
Contrastive Learning framework that pre-trains a 3D network via semantic and
spatial-temporal consistency regularization. For the former, we first leverage
CLIP's text semantics to select the positive and negative point samples and
then employ the contrastive loss to train the 3D network. In terms of the
latter, we force the consistency between the temporally coherent point cloud
features and their corresponding image features. We conduct experiments on
SemanticKITTI, nuScenes, and ScanNet. For the first time, our pre-trained
network achieves annotation-free 3D semantic segmentation with 20.8% and 25.08%
mIoU on nuScenes and ScanNet, respectively. When fine-tuned with 1% or 100%
labelled data, our method significantly outperforms other self-supervised
methods, with improvements of 8% and 1% mIoU, respectively. Furthermore, we
demonstrate the generalizability for handling cross-domain datasets. Code is
publicly available at https://github.com/runnanchen/CLIP2Scene.
Comment: CVPR 2023
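As a simplified sketch of the semantic-driven selection described above, the snippet below pulls each point feature toward the CLIP text embedding of its CLIP-assigned class and pushes it away from the other class embeddings via an InfoNCE-style loss. The paper's full pipeline (pixel-point pairing and spatio-temporal consistency) is not reproduced, and the names are ours.

```python
# Semantic-driven contrastive selection, simplified: CLIP text embeddings act
# as class prototypes, the CLIP-assigned class is the positive, and all other
# classes are negatives. Our illustrative stand-in, not the paper's full loss.
import torch
import torch.nn.functional as F

def semantic_contrastive_loss(point_feats: torch.Tensor,
                              text_embeds: torch.Tensor,
                              pseudo_labels: torch.Tensor,
                              temperature: float = 0.07) -> torch.Tensor:
    """point_feats: (N, D) 3D network features; text_embeds: (C, D) CLIP text
    embeddings; pseudo_labels: (N,) classes assigned via CLIP."""
    point_feats = F.normalize(point_feats, dim=1)
    text_embeds = F.normalize(text_embeds, dim=1)
    logits = point_feats @ text_embeds.t() / temperature  # (N, C) similarities
    # Cross-entropy over prototypes is exactly an InfoNCE loss with the
    # CLIP-selected class as the positive.
    return F.cross_entropy(logits, pseudo_labels)
```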
Convolutional neural network based on photoplethysmography signals for sleep apnea syndrome detection
Introduction: Current methods of monitoring sleep disorders are complex, time-consuming, and uncomfortable, even though such monitoring can provide scientific guidance for ensuring sleep quality worldwide. This study seeks a comfortable and convenient method for identifying sleep apnea syndrome (SAS).
Methods: A one-dimensional convolutional neural network model was established. To classify this condition, the model was trained with the photoplethysmographic (PPG) signals of 20 healthy people and 39 SAS patients, and the influence of noise on the model was tested by anti-interference experiments.
Results and Discussion: The results showed that the accuracy of the model for SAS classification exceeds 90%, and the model has some anti-interference ability. This paper provides an SAS detection method based on PPG signals, which is promising for portable wearable detection.
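For illustration, a minimal one-dimensional CNN for classifying fixed-length PPG segments might look like the PyTorch sketch below. The layer sizes, segment length, and sampling rate are assumptions chosen for clarity, not the study's reported architecture.

```python
# Minimal 1D CNN for binary PPG classification (normal vs. SAS). Layer sizes,
# segment length, and sampling rate are illustrative assumptions only.
import torch
import torch.nn as nn

class PPG1DCNN(nn.Module):
    def __init__(self, n_classes: int = 2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(1, 16, kernel_size=7, padding=3), nn.ReLU(),
            nn.MaxPool1d(4),
            nn.Conv1d(16, 32, kernel_size=5, padding=2), nn.ReLU(),
            nn.MaxPool1d(4),
            nn.Conv1d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),   # length-independent global pooling
        )
        self.classifier = nn.Linear(64, n_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, 1, n_samples), e.g. a 30 s PPG segment
        return self.classifier(self.features(x).squeeze(-1))

# Example: a batch of eight 30 s segments at an assumed 100 Hz sampling rate.
logits = PPG1DCNN()(torch.randn(8, 1, 3000))
```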
UniSeg: A Unified Multi-Modal LiDAR Segmentation Network and the OpenPCSeg Codebase
Point-, voxel-, and range-views are three representative forms of point
clouds. All of them have accurate 3D measurements but lack color and texture
information. RGB images are a natural complement to these point cloud views,
and fully exploiting their combined information enables more robust
perception. In this paper, we present a unified multi-modal LiDAR segmentation
network, termed UniSeg, which leverages the information of RGB images and three
views of the point cloud, and accomplishes semantic segmentation and panoptic
segmentation simultaneously. Specifically, we first design the Learnable
cross-Modal Association (LMA) module to automatically fuse voxel-view and
range-view features with image features; this fusion fully exploits the rich
semantic information of images and is robust to calibration errors. Then, the
enhanced voxel-view and range-view features are transformed to the point
space, where the three views of point cloud features are further fused adaptively by the
Learnable cross-View Association module (LVA). Notably, UniSeg achieves
promising results in three public benchmarks, i.e., SemanticKITTI, nuScenes,
and Waymo Open Dataset (WOD); it ranks 1st in two benchmark challenges,
namely the LiDAR semantic segmentation challenge of nuScenes and the panoptic
segmentation challenge of SemanticKITTI. In addition, we construct the OpenPCSeg
codebase, which is the largest and most comprehensive outdoor LiDAR
segmentation codebase. It contains most of the popular outdoor LiDAR
segmentation algorithms and provides reproducible implementations. The
OpenPCSeg codebase will be made publicly available at
https://github.com/PJLab-ADG/PCSeg.
Comment: ICCV 2023; 21 pages; 9 figures; 18 tables; code at https://github.com/PJLab-ADG/PCSeg
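As a generic stand-in for the kind of learnable cross-modal association described above, the sketch below lets LiDAR-view tokens attend to image tokens through standard cross-attention, so the pixel-to-point association is learned rather than fixed by calibration. This is our illustration, not the paper's LMA module; all names and sizes are assumptions.

```python
# Generic learnable cross-modal fusion: LiDAR-view tokens query image tokens
# via cross-attention, with a residual so geometry is preserved. A stand-in
# sketch of the idea, not the paper's LMA architecture.
import torch
import torch.nn as nn

class CrossModalFusion(nn.Module):
    def __init__(self, dim: int = 128, n_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, lidar_feats: torch.Tensor,
                image_feats: torch.Tensor) -> torch.Tensor:
        """lidar_feats: (B, N, D) voxel/range-view tokens; image_feats: (B, M, D)
        image tokens. LiDAR tokens attend to the image to pull in texture cues."""
        fused, _ = self.attn(query=lidar_feats, key=image_feats, value=image_feats)
        return self.norm(lidar_feats + fused)  # residual keeps geometric features

# Example: fuse 4096 range-view tokens with 1024 image tokens.
out = CrossModalFusion()(torch.randn(2, 4096, 128), torch.randn(2, 1024, 128))
```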